About This Guide


This guide is divided into three sections: 


Simulating processor pipelines explains the purpose of the Simulator, the 
processors it supports, the processor pipelines, and common causes of pipeline 
stalls. 


Running the Simulator and interpreting the results tells you how to run the 
Simulator and explains what the target and your computer do during simulation. 
It describes the Simulator’s display and a typical simulation including an example 
routine. The example shows how a function can be optimized and how most pipeline stalls
can be avoided by re-ordering the sequence of assembly instructions.


Appendixes list the Simulator’s Status Bar messages, explain how to use the
Simulator’s shortcut menu, and describe CodeScape’s key debugging options.


NOTE: The example Simulator files and the original C++ code referred to in this guide are 
included on the release CD in the Tutorials\SH4Simulator directory. 


Simulating Processor Pipelines 


This CodeScape pipeline Simulator is for optimizing code that runs on Hitachi SH4 
microprocessors. The Simulator helps you optimize timing-critical sections of code by
highlighting pipeline operations that stall the flow of instruction execution. 


SH4 processors are superscalar, pipelined microprocessors that can execute two instructions in parallel.
The execution cycles depend on the processor version; for more information, refer to your
Hitachi SuperH™ (SH) 32-Bit RISC MCU/MPU Series Hardware Manual.


This guide assumes that you fully understand SH4 internal architecture. Refer to your Hitachi 
SuperH™ (SH) 32-Bit RISC MCU/MPU Series Hardware Manual for more information. 


This section describes SH4 pipeline operation including the pipeline stages and how they 
execute. It also describes the three common causes of pipeline stalls. 
SH4 pipelines and their stages


The SH4 has seven pipelines: General, General Load/Store, Special, Special Load/Store, Floating
Point, Floating Point Extended, and FDIV/FSQRT. Each pipeline stage has a specific task that
can run in parallel with other stages of the same pipeline and with some stages of other
pipelines. Overlapping instructions by running pipeline stages in parallel increases processor
throughput. The stages of each pipeline are listed below.


NOTE: Processing many instructions in parallel can cause complex pipeline behaviour. 


General Pipeline

1. Fetch
2. Instruction decode, issue, register read, and destination address calculation for PC-relative branches
3. Operation
4. Non-memory data access
5. Write-back


General Load/Store Pipeline

1. Fetch
2. Instruction decode, issue, and register read
3. Address calculation
4. Memory data access
5. Write-back

Special Pipeline

1. Fetch
2. Instruction decode, issue, and register read
3. Operation
4. Non-memory data access
5. Write-back

Special Load/Store Pipeline

1. Fetch
2. Instruction decode, issue, and register read
3. Address calculation
4. Memory data access
5. Write-back


Floating Point Pipeline

1. Fetch
2. Instruction decode, issue, and register read
3. Computation 1
4. Computation 2
5. Computation 3 and write-back
Floating Point Extended Pipeline

1. Fetch
2. Instruction decode, issue, and register read
3. Computation 0
4. Computation 1
5. Computation 2
6. Computation 3 and write-back

FDIV/FSQRT Pipeline

Computation takes several clock cycles.


NOTE: Although the FDIV/FSQRT Pipeline is described as a separate pipeline, it is only
used as a sub-pipeline of the Floating Point Pipeline.


Example pipeline operations 


Assume that the op-codes (see Figure 1) are processed in the General Pipeline and that each 
operation takes one clock cycle. On the first clock cycle the instruction at address 0x0000 is 
fetched from memory and on the fifth clock cycle five instructions are processed concurrently. 


The instruction at address:
• 0x0000 writes the result back to the register.
• 0x0002 is in the non-memory data access stage.
• 0x0004 performs the addition operation.
• 0x0006 is decoded and issued, and R3 is read.
• 0x0008 is fetched.


Figure 1: Instruction execution during five clock cycles (columns: Address, Instruction, and clock cycles 1 to 8)
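The overlap described above can be sketched in C++ as a minimal model. It assumes one instruction enters the General Pipeline per clock cycle, with no stalls and no dual issue; the stage labels and the function `stageAt` are illustrative and are not part of CodeScape.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Stage labels of the General Pipeline, in order (illustrative names).
const std::array<std::string, 5> kGeneralStages = {
    "Fetch", "Decode/Issue", "Operation", "Non-memory data access", "Write-back"};

// Returns the stage that instruction `index` (0 = the instruction at 0x0000,
// 1 = 0x0002, ...) occupies during clock cycle `cycle` (1-based), or an empty
// string if the instruction has not been fetched yet or has already completed.
std::string stageAt(int index, int cycle) {
    assert(index >= 0 && cycle >= 1);
    // Instruction `index` is fetched in cycle index + 1 and then advances
    // one stage per clock cycle.
    int stage = cycle - 1 - index;
    if (stage < 0 || stage >= static_cast<int>(kGeneralStages.size())) {
        return "";
    }
    return kGeneralStages[static_cast<std::size_t>(stage)];
}
```

For clock cycle 5, `stageAt(0, 5)` returns "Write-back" (the instruction at 0x0000) and `stageAt(4, 5)` returns "Fetch" (the instruction at 0x0008), which matches the five instructions in flight listed above.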
Initial stages of pipeline execution


The two initial stages of pipeline execution, Fetch and Decode, are common to all instructions.
In a single clock cycle, the SH4 fetches two instructions and can decode up to two previously
fetched instructions. In some conditions it cannot fetch two instructions; for example, when a
branch destination is not aligned on a 32-bit boundary.


Decode rate 


The decode rate is the number of instructions that are decoded and successfully issued. It is
limited by the type of instructions in the pipeline and by any resource requirements. An
instruction’s type depends on the pipeline it uses. See “SH4 pipelines and their stages” on page 4.


A dual issue occurs when two instructions decode and issue simultaneously. Some instructions
cannot dual issue and cause pipeline stalls. An instruction cannot dual issue if:

• Its decode rate does not allow dual issue.
• It has a resource dependency problem.
• Another instruction locks a required stage in the pipeline.


SH4 instructions are divided into six groups: MT, EX, BR, LS, FE, and CO. Table 1 shows which
pairs of instruction groups can dual issue.


Table 1: SH4 instructions and dual issue restrictions

                          Second instruction
                          MT   EX   BR   LS   FE   CO
First          MT         ✓    ✓    ✓    ✓    ✓
instruction    EX         ✓         ✓    ✓    ✓
               BR         ✓    ✓         ✓    ✓
               LS         ✓    ✓    ✓         ✓
               FE         ✓    ✓    ✓    ✓
               CO


NOTE: A ✓ indicates that the instruction pair can dual issue provided there are no
resource dependencies.
Example decode rate stall


The following code section shows a series of LS and EX instructions with the issue tests 
highlighted. Time Slot 4 shows a pipeline stall because two LS instructions were fetched in Time 
Slot 3 but only one may be dispatched. 


Figure 2: Decode rate stall (columns: Address, Instruction, Type, and Time Slots)

NOTE: Time Slots represent the number of clock cycles that have occurred at that point.
For example, at Time Slot 4, four clock cycles have occurred. For more information
on Time Slots see “Information Displayed” on page 12.
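The dual issue rule and the decode rate stall can be sketched as follows. This is a simplified model, not the Simulator's implementation: `canDualIssue` encodes the group-pairing rule as given in the SH4 hardware manual (CO never pairs, MT pairs with any group except CO, and any other two groups must differ), and `issueSlots` is a hypothetical greedy, in-order scheduler that ignores fetch alignment, resource dependencies, and locked stages. Check the rule against the manual for your processor version.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Instruction groups used in Table 1.
enum class Group { MT, EX, BR, LS, FE, CO };

// Group-pairing rule (assumed from the SH4 hardware manual): CO never pairs,
// MT pairs with any group except CO (including another MT), and any other
// two groups can pair only if they differ. Resource dependencies and locked
// stages are not checked here.
bool canDualIssue(Group first, Group second) {
    if (first == Group::CO || second == Group::CO) return false;
    if (first == Group::MT || second == Group::MT) return true;
    return first != second;
}

// Hypothetical greedy, in-order issue model: returns the Time Slot (from 1)
// in which each instruction issues. An instruction joins the previous slot
// only if that slot holds a single instruction and the pair is allowed.
std::vector<int> issueSlots(const std::vector<Group>& program) {
    std::vector<int> slots;
    int slot = 0;
    bool slotHasRoom = false;  // true when the current slot holds exactly one instruction
    for (std::size_t i = 0; i < program.size(); ++i) {
        if (slotHasRoom && canDualIssue(program[i - 1], program[i])) {
            slotHasRoom = false;  // second instruction of a dual issue
        } else {
            ++slot;               // start a new Time Slot
            slotHasRoom = true;
        }
        slots.push_back(slot);
    }
    assert(slots.size() == program.size());
    return slots;
}
```

For the sequence LS, EX, LS, LS, EX, `issueSlots` returns Time Slots 1, 1, 2, 3, 3: the fourth instruction cannot pair with the third because both are LS instructions, so it waits for a new Time Slot, which is the same kind of decode rate stall shown in Figure 2.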
Resource dependency


Resource dependency is a common cause of pipeline stalls. Resource dependency stalls occur
when:

• An instruction needs a resource to complete an operation and cannot release the
resource until the operation is complete.
• The resource is also required by another instruction that must wait for the
resource to be released before it can execute.


Example resource dependency stall 


EX and LS instructions normally dual issue but do not in this example due to a resource 
dependency stall. The stall occurs because R1 contains the result of the ADD but the value is 
not available for the MOV to issue until Time Slot 3. 


Figure 3: Resource dependency stall (annotated: 1 latency stall)


Figure 3 shows only a short stall; resource dependencies can also cause very
long stalls. Figure 4 shows that the instruction FADD DRm, DRn can stall the next
instruction’s decode by up to 9 Time Slots.


Examine the instruction latency 


When trying to resolve a resource dependency stall, examine the instruction latency to find out
when an operation finishes producing the required data. The latency is the number of Time
Slots after an instruction is issued before the data is available.


Refer to your Hitachi SuperH™ (SH) 32-Bit RISC MCU/MPU Series Hardware Manual for more 
information. Figure 4 is from the manual and includes the definition for ADD. (ADD has a 
latency of 1.) 
Figure 4: Instruction latency of ADD (the excerpt also lists MOV @Rm, Rn and MOV.L @Rm+, Rn)
These instructions show special cases:

• MOV.L @Rm+, Rn has two latency values, one for each of the two registers it
changes (the incremented Rm takes 1; the loaded value in Rn takes 2).
• MOV Rm, Rn has a latency of 0, which means that its result is available
immediately.
• FADD DRm, DRn has two values in brackets. The values define the delay on the
two halves of the 64-bit value.
• Conditional branches have two possible values. The value used depends on
whether the branch is taken or not.
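Latency-driven resource dependency stalls can be sketched with a simple register scoreboard, assuming the rule stated above: a result written by an instruction issued in Time Slot t becomes available in Time Slot t + latency, and an instruction that reads it cannot issue earlier. The `Instr` and `RegWrite` types, the `pairsWithPrevious` flag, and the `issue` function are hypothetical. Only read-after-write dependencies are modelled; the group-pairing rule and locked stages are not, so the caller marks which instructions would otherwise dual issue. The usage example uses the latencies quoted in this section (ADD 1; MOV.L @Rm+, Rn 1 for Rm and 2 for Rn; MOV Rm, Rn 0) and assumes 2 for the plain MOV.L @Rm, Rn load.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// A register written by an instruction, with the latency of that result.
struct RegWrite {
    std::string reg;
    int latency;  // Time Slots after issue before the value is available
};

// Hypothetical description of one instruction for this sketch.
struct Instr {
    std::vector<std::string> reads;  // source registers
    std::vector<RegWrite> writes;    // destination registers and their latencies
    bool pairsWithPrevious;          // would dual issue with the previous instruction if not stalled
};

// Returns the Time Slot in which each instruction issues, in program order,
// starting from firstSlot. Only read-after-write dependencies are checked.
std::vector<int> issue(const std::vector<Instr>& program, int firstSlot) {
    std::map<std::string, int> readyAt;  // first slot in which each register's new value can be read
    std::vector<int> slots;
    int lastSlot = firstSlot - 1;
    int inLastSlot = 2;                  // treat the slot before firstSlot as full
    for (const Instr& in : program) {
        // Ideal slot: share the previous slot if pairing is allowed and it has room.
        int slot = (in.pairsWithPrevious && inLastSlot == 1) ? lastSlot : lastSlot + 1;
        // Stall until every source register is available.
        for (const std::string& reg : in.reads) {
            auto it = readyAt.find(reg);
            if (it != readyAt.end()) slot = std::max(slot, it->second);
        }
        for (const RegWrite& w : in.writes) {
            assert(w.latency >= 0);
            readyAt[w.reg] = slot + w.latency;
        }
        inLastSlot = (slot == lastSlot) ? inLastSlot + 1 : 1;
        lastSlot = slot;
        slots.push_back(slot);
    }
    return slots;
}
```

With ADD r2, r1 issued in Time Slot 2, a following MOV.L @r1, r3 that would have dual issued with it waits until Time Slot 3 for R1, as in Figure 3, and a MOV r3, r4 that reads the loaded value waits until Time Slot 5. With MOV.L @r1+, r3, the incremented R1 is available one slot after issue but the loaded R3 only after two.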
Locked stage

Some instructions require extensive use of certain stages of the pipeline. This can cause 
instructions later in the pipeline to stall. 

Example locked stage stall 


The CLRMAC instruction locks the F1 stage of the floating-point pipelines for two cycles. Stalls
of this type are unusual, but removing them can substantially increase performance.


Figure 5: Locked stage stall 
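A locked stage can be sketched as a reservation on a single stage. This is a hypothetical, minimal model and not how CodeScape implements it: `StageReservation` records only the most recent lock on one stage (for example F1), and the slot numbers in the usage example are illustrative.

```cpp
#include <cassert>

// Hypothetical reservation record for one pipeline stage, such as F1.
// Slots in [lockedFrom, freeFrom) are unavailable to other instructions.
struct StageReservation {
    int lockedFrom = 0;  // first locked Time Slot
    int freeFrom = 0;    // first Time Slot after the lock (no lock when equal)

    // Lock the stage for `cycles` consecutive Time Slots starting at `slot`.
    void lock(int slot, int cycles) {
        assert(cycles > 0);
        lockedFrom = slot;
        freeFrom = slot + cycles;
    }

    // Earliest Time Slot, no earlier than `wanted`, in which the stage is free.
    int earliestUse(int wanted) const {
        return (wanted >= lockedFrom && wanted < freeFrom) ? freeFrom : wanted;
    }
};
```

If CLRMAC locks F1 for Time Slots 3 and 4 (`lock(3, 2)`), an instruction that would use F1 in Time Slot 4 is delayed until Time Slot 5 (`earliestUse(4)` returns 5), a one-slot stall.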
Running the Simulator and Interpreting the Results


When you run the Simulator, CodeScape first copies the target processor's cache to your
computer, where it is maintained until you close the Simulator. The Simulator uses the target’s
memory and registers as the source and destination of each assembly instruction, and switches
CodeScape from debugging mode to simulation mode. Switching to simulation mode halts the
target processor, and all execution operations such as run, stop, and trace are simulated on
your computer.


To run the Simulator:
• Click Tools, Simulate Processor.
-OR-
• Configure a breakpoint’s Trigger Actions. In the Configure Breakpoint(s) dialog,
select Cause processor simulation to start to tell the Simulator to run when the
breakpoint triggers.

See “Configure a breakpoint” on page 31.


When you close the Simulator, it executes any remaining instructions in the simulated pipeline and
then returns CodeScape to normal debugging on the target. However, the cache is not restored.


NOTE: Use the Simulator’s shortcut menu commands to configure the Simulator and 
access the debugging functions. 
Information Displayed


When you simulate a function, the simulated pipeline’s operation is described in the Simulator
output regions. The Simulator output shows the flow of instruction execution over time, divided
into Time Slots. Each Time Slot describes pipeline operations that are completed concurrently.


The display has two cursor bars. The vertical bar highlights the instructions executing in the active
Time Slot. The horizontal bar is a marker that highlights the assembly instruction that is
considered to be executing. The information on the Status Bar relates to the active Time Slot:


Diagnosis describes the stall type encountered and what caused it. 


Cache describes memory operations that affect the cache when reading and 
writing data. 


System clock shows the total time taken for processor operations up to the current 
cursor position. 


The information generated by the Simulator can be printed or saved in a file with the .sim
extension. For information on reading the printed or saved information, see “Understanding the
printed or saved results” on page 16. For information on how to print and save files see 
“Appendix B: Simulator’s Shortcut Menu” on page 27. 


SH4 pipeline operation display 


The Simulator evaluates each instruction’s functionality at the appropriate stage of the pipeline.
For example, the instruction mov.l @r0, r3 tells the processor to read 32 bits from the address
stored in R0, then put the result in R3.


NOTE: Debugging in another region while viewing the Simulator can be confusing. This is
because the Simulator only reports complete Time Slot operations and does not
show information about an instruction until the whole Time Slot is complete.


Different colors describe the state of the processor, and mnemonics describe specific operations at 
each stage of the pipeline. An operation shown in: 


Black means the processor was OK. 
Red indicates the processor stalled. 
Blue indicates the processor missed the cache. 


Pink indicates the processor stalled and missed the cache. 
Table 2: Instruction execution and pipeline operations

Mnemonic   Operation
IF         Instruction fetch.
if         Dummy instruction fetch where external memory is not accessed.
ID         Instruction decoded/issued/register read.
D          Decode stage locked.
           Decode stage (register read only).
EX         Instruction execution.
SX         Execution phase, the SX stage used.
           SX stage locked, not used.
NA         Memory not accessed/no operation address.
MA         Memory accessed/operation address.
S          Register write-back (data stored to registers after the operation).
F0         Floating point 0 stage accessed (special stage for inner product/transforms).
F1         Floating point 1 stage accessed.
           Floating point 1 stage locked and not accessed.
f1         Floating point 1 stage partial usage (can overlap with other f1 operations but not F1).
F2         Floating point 2 stage accessed.
F3         Floating point 3 stage accessed (special stage for divide/square root).
FS         Floating point store/write-back.
>FPSCR<    Floating point status register updated.

In the .sim file, all of the operations are represented by the mnemonics listed above,
except >FPSCR<, which is represented by FC.
Example Simulator display


The screenshot shows that 148 clock cycles occurred before the Register Usage Stall in the active
Time Slot, where:

• Five instructions are processed concurrently.
• It takes 55 clock cycles to complete the Time Slot (see the header of the active column).
A Time Slot’s completion time is determined by the parallel operation that takes
the most time to complete. In the example it is the memory access.
• R15 causes a stall in the decode phase.
• A data read missed the cache, so a new cache line had to be fetched from memory.
The new cache line fetch was delayed by a copy-back already in progress.
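The rule that the slowest parallel operation sets the length of a Time Slot, and that the System clock accumulates those lengths, can be sketched as follows. The function names, the one-cycle minimum, and the per-operation cycle counts are assumptions for illustration; the Simulator's own accounting of cache line fetches and copy-backs is more detailed.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Clock cycles needed to complete one Time Slot: the longest of the
// operations running in parallel in it (assumed to be at least 1 cycle).
int timeSlotCycles(const std::vector<int>& parallelOpCycles) {
    int longest = 1;
    for (int cycles : parallelOpCycles) {
        assert(cycles >= 0);
        longest = std::max(longest, cycles);
    }
    return longest;
}

// System clock after a sequence of Time Slots: the sum of their durations.
long systemClock(const std::vector<std::vector<int>>& timeSlots) {
    long total = 0;
    for (const auto& slot : timeSlots) {
        total += timeSlotCycles(slot);
    }
    return total;
}
```

If the other operations in the active Time Slot each took 1 cycle and the memory access took 55, `timeSlotCycles({1, 1, 55, 1, 1})` returns 55, and the System clock would advance from 148 to 203 when the slot completes.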


Figure 6: Register Usage Stall 
Using the Simulator


Most pipeline stalls can be avoided by re-ordering the sequence of assembly instructions. The
Simulator highlights pipeline stalls, helping you to decide where instructions can be re-ordered.
It is recommended that you run the Simulator only on small sections of code.


A typical method for using the Simulator is:

1. Start CodeScape.
   On the Windows desktop, click the Start button, choose Programs and select
   CodeScape.
2. Load a program file.
   Click File, Load Program file and select the file you want to simulate.
3. Create a Source region.
   Click Window, New Window then right-click in the region and select Source.
4. Add a breakpoint at the point you want to start the simulation.
   Place the cursor where you want to start the simulation, then click the start
   breakpoint button on the Breakpoint toolbar.
5. Add a breakpoint at the point you want to end the simulation.
   Place the cursor where you want to end the simulation, then click the end
   breakpoint button on the Breakpoint toolbar.
6. Run your program to the start breakpoint.
   Right-click in the Source region, select Execution... and click Run. Run until
   the start breakpoint occurs.
7. Start the Simulator.
   Click Tools, Simulate Processor.
8. Run to the end breakpoint.
   Right-click in the Simulator, select Execution... and click Run. Run until the
   end breakpoint occurs.
9. Analyze the results. You can print the pipeline results or save them to a file.
   Right-click in the Simulator and click Print... or Save to file...
10. Close the Simulator.
    On the Simulator’s title bar, click the Close button.
11. Based on the simulation results, edit or re-order your source code to optimize
    your program and improve pipeline operation.
12. Simulate the optimized code.


Understanding the printed or saved results 


When you print or save a simulation, each stall is represented by a letter. Where a stall
occurs, the appropriate letter is shown at the bottom of the Time Slot that generated it. Control
instruction stalls can be avoided only by replacing the control instruction with an alternative
instruction. Resource conflict stalls and same group stalls are avoided by re-ordering the Time
Slot instructions.


Table 3: How stalls appear in the Simulator’s printed or saved output

Letter:   Shows:
          A write-back to the register when a memory access is incomplete.
          An instruction generated stall.
          Resource conflict. For example, one instruction trying to write to a register when
          another instruction is trying to read the same register.
          The SX stage of the instruction is locked.
          A floating point pipeline stall caused by multiple use of the F0, F1, or F3 stages.
          One or more control group instructions being dual issued.
          An unknown stall type. (Error in the Simulator.)


Function optimization example 


C/C++ function 


The C/C++ function calculates the cross product of two vectors then returns the magnitude. 


#include <math.h>

struct Vector {
    float x, y, z;
};

float CrossProduct(Vector *v1, Vector *v2, Vector *res)
{
    // res = v1 x v2
    res->x = (v1->y) * (v2->z) - (v2->y) * (v1->z);
    res->y = (v2->x) * (v1->z) - (v1->x) * (v2->z);
    res->z = (v1->x) * (v2->y) - (v2->x) * (v1->y);

    // Magnitude of the result vector
    return sqrtf(res->x * res->x + res->y * res->y + res->z * res->z);
}
First hand-coded assembler version


The first hand-coded assembler version of the C/C++ function follows the algorithm. The
code is written in assembler to reduce the number of memory reads/writes but does not address
any of the pipeline problems. The simulation of the code is shown in Figure 7 on page 19. The
hand-coded assembler is shown in the right-hand column under Disassembly.


        .EXPORT _CrossProduct

_CrossProduct:

; Load vectors

        fmov.s  @r4, fr4            ; v1->x
        fmov.s  @r5, fr7            ; v2->x
        mov     #4, r0
        fmov.s  @(r0,r4), fr5       ; v1->y
        fmov.s  @(r0,r5), fr8       ; v2->y
        mov     #8, r0
        fmov.s  @(r0,r4), fr6       ; v1->z
        fmov.s  @(r0,r5), fr9       ; v2->z

; Calculate res->x

        fmov    fr5, fr1
        fmul    fr9, fr1            ; v1->y * v2->z
        fmov    fr8, fr0
        fmul    fr6, fr0            ; v2->y * v1->z
        fsub    fr0, fr1            ; (v1->y * v2->z) - (v2->y * v1->z)
        fmov.s  fr1, @r6

; Calculate res->y

        fmov    fr7, fr2
        fmul    fr6, fr2            ; v2->x * v1->z
        fmov    fr4, fr0
        fmul    fr9, fr0            ; v1->x * v2->z
        fsub    fr0, fr2            ; (v2->x * v1->z) - (v1->x * v2->z)
        mov     #4, r0
        fmov.s  fr2, @(r0,r6)

; Calculate res->z

        fmov    fr4, fr3
        fmul    fr8, fr3            ; v1->x * v2->y
        fmov    fr7, fr0
        fmul    fr5, fr0            ; v2->x * v1->y
        fsub    fr0, fr3            ; (v1->x * v2->y) - (v2->x * v1->y)
        mov     #8, r0
        fmov.s  fr3, @(r0,r6)

; Calculate the magnitude

        fldi0   fr0
        fipr    fv0, fv0            ; (res->x*res->x)+(res->y*res->y)+(res->z*res->z)
        fsqrt   fr3                 ; sqrt
        rts
        fmov    fr3, fr0            ; return result in fr0 (executed in the rts delay slot)
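The magnitude calculation depends on FIPR computing a four-element inner product of FV0 (FR0 to FR3) with itself and writing it to FR3. Clearing FR0 with fldi0 makes the first term zero, so FR3 receives res->x² + res->y² + res->z². The following C++ model of those three instructions is for illustration only; the function name is hypothetical, and the hardware FIPR result may not be rounded exactly like this loop.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Models "fldi0 fr0; fipr fv0, fv0; fsqrt fr3", assuming fr1, fr2, and fr3
// already hold res->x, res->y, and res->z.
float magnitudeViaFipr(float x, float y, float z) {
    std::array<float, 4> fv0 = {0.0f, x, y, z};  // fr0 cleared by fldi0
    float inner = 0.0f;
    for (float v : fv0) {
        inner += v * v;                          // fipr fv0, fv0 sums all four products
    }
    fv0[3] = inner;                              // FIPR writes its result to fr3
    return std::sqrt(fv0[3]);                    // fsqrt fr3
}
```

For example, `magnitudeViaFipr(3.0f, 4.0f, 0.0f)` returns 5.0f.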




Simulator output 


The three example simulations show how the function is examined, then optimized through two 
iterations using information generated by the Simulator to re-order the code. It is assumed that all 
memory accesses hit the cache and take 1 clock cycle, and no registers are saved or restored as 
part of the function call. No instructions have been removed or changed. Performance 
improvements are due to changing the instruction execution order. 


NOTE: The assembler for each simulation is in the right-hand column under Disassembly. 


First simulation (see Figure 7 on page 19)


In the first simulation, the function has Control Group Instruction stalls, Same Group stalls, and
Resource Usage stalls:

• Control Group Instruction stalls are due to the JSR and RTS and cannot be
avoided without removing the function call.
• Same Group stalls are due to the code reading all input data at the top of the
program.
• Resource Usage stalls are due to the linear nature of the calculation.

Second simulation (see Figure 8 on page 20)

In the second simulation, time is saved by:

• Delaying the output vector save until the magnitude calculation starts. The two
large resource dependency stalls, waiting for the result of the inner product and
the square root, free the stages needed to save the vector to memory.
• Interleaving the input vector loads with the calculation so that more operations dual
issue.


Third simulation (see Figure 9 on page 21)


In the third simulation, time is saved by reducing the resource dependency stalls between the 
FMUL and FSUB, and the FMOV and FMUL. 


The third version of the function shows a time improvement of about 20%. This can make a 
substantial difference to your program if the function is called many thousands of times. 


NOTE: The code can be optimized even further if it is changed to re-use FR4-FR9
instead of FR0, and if FNEG and FMAC are used instead of FMUL and FSUB.


SH4 memory model 


The previous examples assumed that all memory accesses took 1 clock cycle. In practice this
is not true: access (read/write) to external memory takes several cycles to complete. The
Simulator models many of the features present on the SH4, including:

• Cached and uncached memory, including cache update methods such as
write-back, copy-back, and RAM mode.
• The store queues.
• Data bus access size (byte, word, long, double, and cache line).
• Instructions that operate on cache memory, such as PREF, MOVCA, and OCBI.

To configure this model, the Simulator reads information from two sources:

• On startup, the cache control register and the processor cache memory
(instruction and data caches). These are used to initialize its internal
representation of the cache.
• The user-specified timing information and valid memory areas from a
configuration file (see Figure 10).


Figure 10: Example DASH4 memory configuration file (timing columns: WRITE and READ, each with Byte, Word, Long, Quad, and Cache)
CodeScape Pipeline Simulator


Memory configuration file: valid memory section 


The valid memory section of the configuration file describes specific areas of memory to the 
Simulator. 


The available fields are: 


Supported access method. 
For example, Read only (Read), Write only (Write), or Read and Write 
(ReadWrite). 


Start and end address of the memory section being defined. 


Access size (BYTE, WORD, LONG). 
This tells CodeScape how to request memory from the target for this memory 
area. Access size is not used by the Simulator. 


Access Restrictions (NoRestrictions or SimulatorHardwarePort). 
Defines the part of CodeScape, or the part of the Simulator, that can see or access 
this memory area. 


Timing information. 

Defines the access time for the Simulator when accessing this memory area with 
the set access size. A value of -1 means that the access type is not possible and 
generates an error if attempted. For example, cache accesses to the P2 and P4 
memory spaces are marked "-1". 
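How the Simulator uses the timing information can be sketched as a lookup. This is an illustrative C++ model only: the `MemoryArea` type, its field names, and the cycle counts in the usage example are hypothetical and reproduce neither the configuration file's syntax nor its default values. The access restriction field is omitted.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

enum class AccessMethod { Read, Write, ReadWrite };
// Timing columns, in the order Byte, Word, Long, Quad, Cache (line).
enum class AccessSize { Byte, Word, Long, Quad, Cache };

// Hypothetical in-memory form of one valid memory entry.
struct MemoryArea {
    AccessMethod method;
    std::uint32_t start;             // first address in the area
    std::uint32_t end;               // last address in the area (inclusive)
    std::array<int, 5> readCycles;   // indexed by AccessSize; -1 = not possible
    std::array<int, 5> writeCycles;  // indexed by AccessSize; -1 = not possible
};

// Access time in clock cycles; throws for an access the area does not allow.
int accessTime(const MemoryArea& area, std::uint32_t address, AccessSize size, bool isWrite) {
    assert(area.start <= area.end);
    if (address < area.start || address > area.end) {
        throw std::out_of_range("address outside this memory area");
    }
    bool allowed = isWrite ? area.method != AccessMethod::Read
                           : area.method != AccessMethod::Write;
    if (!allowed) {
        throw std::runtime_error("access method not supported by this area");
    }
    const std::array<int, 5>& table = isWrite ? area.writeCycles : area.readCycles;
    int cycles = table[static_cast<std::size_t>(size)];
    if (cycles == -1) {
        throw std::runtime_error("access type not possible in this area");
    }
    return cycles;
}
```

Marking an entry -1 makes that access an error, in the same way that cache accesses to the P2 and P4 memory spaces are rejected.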


CAUTION: Do not change the default values in the memory configuration file. 


NOTE: Optimizing cache use with the PREF instruction can substantially improve
performance.
Appendix A: Status Bar Messages


The messages on the Simulator’s status bar provide stall and cache operation information about 
the current instruction. 


Stall messages are: 


Register Usage Stall. 

SX Stage Stall. 

Floating Point Pipeline Stall. 

CO Group Dispatch Failure Stall. 

Same Group Dispatch Failure Stall. 
Memory Access/Instruction Fetch Stall. 
Write Back/Register Usage Stall. 
Multiplier Usage Stall. 


Instruction Generated Stall. 


Memory Access/Memory Access Stall. 


Cache operation messages are: 
Write Miss.

Read Miss, Line Fetched. 
Data Write Miss. 

Data Read Miss. 

Data Write Hit. 

Data Read Hit. 
Instruction Read Miss. 
Store Queue Write. 
Block. 

Copy-Back. 

Line Fetched. 
Write-Back. 
Write-Through. 
Allocation. 

Purge. 

Invalidate. 

Write-Back. 

Prefetch in Progress. 
Copy-Back in Progress. 


Write-Back in Progress. 


Appendix B: Simulator’s Shortcut Menu


Table 4: The shortcut menu in the Simulator

Select:            To:
Show Stall Type    Show the type of stall generated: “R>”, “R”, “c>”.
Properties...      Configure fonts and colors. Set the update rate for a single region, a region
                   type, and each processor. Change the tab settings.


Appendix C: CodeScape’s Debug Menu


CodeScape’s regular debugging functions are available when the Simulator is running. The 
debugging functions include commands for: controlling program execution, stepping code, 
using breakpoints, and setting the cursor to the PC and vice versa. 


When you single step in a Simulator region, the cursor is shown at the instruction currently
executing in the pipeline. (When you single step in any other CodeScape region, the cursor is
shown at the PC.) During simulation, the PC is ahead of the current instruction because
instructions are fetched in advance. Some fetched instructions are never executed because the
program flow changes; for example, instructions fetched after a branch.


NOTE: You cannot run the Profiler and the Simulator at the same time. 


Execute a program 


In a Source or Disassembly region, run your program in one of the following ways: 


On the Debug menu, click Run to run your program (F9). 


On the Debug menu, point to Execution then click Run All to run all of your 
programs simultaneously (CTRL+F9). 


On the Debug menu, point to Execution then click Run to Cursor to run your 
program to the cursor position (ALT+F9). 


On the Debug menu, point to Execution then click Run to Address... Type an 
address in the Expression field of the Run to Address/Instructions dialog to run 
your program until it executes a specified address (SHIFT+F9). 


Step into (trace) code 


In a Source or Disassembly region, step into your code in one of the following ways: 


Click Debug, click Step to single step a line of code (F7). 


On the Debug menu, click Forced Step to step a line of source code at the 
disassembly level (SHIFT+F7). 


On the Debug menu, click Step Over to step over a line of source code (F8). 


Add a breakpoint 


In a Source or Disassembly region: 


1. Click Debug, click Goto Address. Enter the address expression to go to and click OK.
2. On the Breakpoint toolbar, click the breakpoint button to set a breakpoint.
3. On the Debug menu, click Execution, click Run to Address... Your program will
run until the breakpoint is reached.
Configure a breakpoint


CodeScape enables breakpoint configuration including data accesses within memory ranges and 
breakpoints on external peripheral devices. 


To configure a breakpoint: 


• Click Debug, point to Breakpoints, then click Configure Breakpoint(s)...
(CTRL+F5).
The Configure Breakpoint(s) dialog appears. Specify any options you require. For
more information on using the dialog, refer to the online help supplied with your
version of CodeScape.